Semantic Re-ranking in Ad-hoc Robust Retrieval

Authors

Pierpaolo Basile

Annalina Caputo

Giovanni Semeraro

Abstract

This paper investigates a re-ranking strategy presented at SIGIR 2010. In that work we described a re-ranking strategy in which the output of a semantic-based IR system is used to re-weigh documents by exploiting inter-document similarities computed in a vector space. The space is built using the Random Indexing technique. The effectiveness of the strategy was evaluated in the context of the CLEF Ad-Hoc Robust-WSD Task, while in this paper we propose new experiments on the TREC Ad-Hoc Robust Track 2004.

1 Background and Motivation

A general approach to overcoming the word ambiguity problem in IR is to represent documents by word meanings. Among the most investigated techniques are those that rely on WordNet synsets, through which groups of synonymous words are uniquely identified and linked to each other by semantic relations. The Robust-WSD task at the Cross Language Evaluation Forum (CLEF) [1] has shown that results improve when aggregation strategies are exploited.

The method proposed in [6] describes a different approach to document aggregation, based on a variation of the "inter-document similarities" idea [8]. The method combines two retrieval strategies that work at two different representation levels: keyword and synset. The ranked list of documents retrieved using the synset-based representation (synset list) is exploited to re-rank the list of documents retrieved using the keyword-based one (keyword list). The insight behind this method is that documents in the keyword list with the highest number of similar documents in the synset list should climb in the result set.

The approach re-weighs the documents returned for a query by promoting those with the highest number of supporters. In this context, a supporter is a document whose content is similar to that of the target document. Inter-document similarities are computed using the Random Indexing technique, which builds a vector space in which similar documents are represented close to each other.
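To make the vector-space idea concrete, here is a minimal Python sketch of one common formulation of Random Indexing: every term receives a sparse ternary random index vector, a document vector is the frequency-weighted sum of the index vectors of its terms, and cosine similarity measures how close two documents are. The text above does not say how the paper builds its document vectors, so the function names (`index_vector`, `document_vector`, `cosine`) and the parameters (`dim`, `nonzero`, `seed`) are assumptions made purely for illustration.

```python
import math
import random
from collections import Counter


def index_vector(term, dim=64, nonzero=4, seed=0):
    """Sparse ternary index vector for a term (hypothetical helper).

    A generator seeded with the term itself makes the vector reproducible,
    so the same term always maps to the same random vector.
    """
    rng = random.Random(f"{seed}:{term}")
    vec = [0] * dim
    # Pick `nonzero` distinct positions and alternate +1 / -1 among them.
    for k, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1 if k % 2 == 0 else -1
    return vec


def document_vector(tokens, dim=64, nonzero=4, seed=0):
    """Sum the index vectors of a document's terms, weighted by frequency."""
    vec = [0.0] * dim
    for term, freq in Counter(tokens).items():
        for pos, value in enumerate(index_vector(term, dim, nonzero, seed)):
            vec[pos] += freq * value
    return vec


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Documents that share many terms accumulate the same index vectors and therefore tend to end up close in the reduced space, while the dimensionality stays fixed at `dim` regardless of vocabulary size.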
Let us denote by $L_k$ and $L_s$ the ranked lists of documents retrieved using the keyword and synset representations, respectively. The idea behind our re-ranking method is to give more evidence to the documents in $L_k$ that are widely supported by similar documents occurring in both lists. The method requires the following steps:

1. For each document $d_i \in L_k$, compute $\mathit{supporters}(d_i, \alpha)$, the set of $\alpha$ documents $\{d_1, \dots, d_\alpha\} \subset L_k$ with the highest inter-document similarity to $d_i$.
2. Get $\mathit{overlap\_supporters} = \{d_j \in L_s : d_j \in \mathit{supporters}(d_i, \alpha)\}$, the set of documents occurring in both $L_s$ and $\mathit{supporters}(d_i, \alpha)$.
3. Assign to $d_i$ a new score $S(d_i)$ that takes into account the supporting documents computed in step 2. Formally:

$$S(d_i) = \theta \cdot S_{\mathit{supporters}} + (1 - \theta) \cdot S_k(d_i) \qquad (1)$$

where

$$S_{\mathit{supporters}} = \sum_{d_j \in \mathit{overlap\_supporters}} S_k(d_j) \cdot S_s(d_j) \qquad (2)$$

and $S_k(d_j)$ is the score of $d_j$ in $L_k$, $S_s(d_j)$ is the score of $d_j$ in $L_s$, and $\theta$ is a free parameter used to smooth $S_{\mathit{supporters}}$, which denotes the combination of the scores of the supporting documents.

Footnote 1 (WordNet): A semantic lexicon for the English language.
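The three steps and equations (1) and (2) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the names `supporters` and `rerank`, the dictionary-based score lists, and the toy similarity table are hypothetical, and the sketch assumes that a document is not counted among its own supporters, which the text does not specify.

```python
def supporters(doc_id, keyword_list, similarity, alpha):
    """Step 1: the alpha documents in L_k most similar to doc_id.

    Assumption: doc_id itself is excluded from its own supporters.
    """
    others = [d for d in keyword_list if d != doc_id]
    others.sort(key=lambda d: similarity(doc_id, d), reverse=True)
    return others[:alpha]


def rerank(keyword_scores, synset_scores, similarity, alpha, theta):
    """Re-score every document in L_k with equations (1) and (2)."""
    keyword_list = list(keyword_scores)
    new_scores = {}
    for d_i, s_k in keyword_scores.items():
        sup = supporters(d_i, keyword_list, similarity, alpha)
        # Step 2: keep only the supporters that also occur in L_s.
        overlap = [d_j for d_j in sup if d_j in synset_scores]
        # Equation (2): sum of S_k(d_j) * S_s(d_j) over the overlap.
        s_sup = sum(keyword_scores[d_j] * synset_scores[d_j] for d_j in overlap)
        # Equation (1): interpolate with the original keyword score.
        new_scores[d_i] = theta * s_sup + (1 - theta) * s_k
    return sorted(new_scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data (invented for illustration only).
keyword_scores = {"a": 0.9, "b": 0.8, "c": 0.5}   # S_k over L_k
synset_scores = {"b": 0.7, "c": 0.6}              # S_s over L_s
pair_similarity = {
    frozenset(("a", "b")): 0.1,
    frozenset(("a", "c")): 0.2,
    frozenset(("b", "c")): 0.9,
}


def toy_similarity(x, y):
    """Symmetric lookup in the toy inter-document similarity table."""
    return pair_similarity[frozenset((x, y))]


ranking = rerank(keyword_scores, synset_scores, toy_similarity, alpha=1, theta=0.5)
```

With `theta = 0` the function reproduces the original keyword ranking, and larger values of `theta` give more weight to the supporters' combined scores. In practice the similarity function would be the cosine computed in the Random Indexing space rather than a lookup table.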




Publication date: 2011
